Multi-seed Lossless Filtration (Extended Abstract)
Authors
Abstract
We study a method of seed-based lossless filtration for approximate string matching and related applications. The method is based on the simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt and Kärkkäinen [1]. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.
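The following is a minimal Python sketch of what "lossless" means for a seed family. It makes several assumptions. Similarities are modeled as binary match/mismatch strings under the Hamming distance, with no indels. Seeds are written with `#` for a required match and `-` for a joker. A family is treated as lossless for (m, k) if every length-m similarity with k mismatches is hit by at least one seed. The helper names `hits`, `is_lossless` and `lossless_threshold` are hypothetical and do not come from the paper. The exhaustive enumeration is a swapped-in baseline, not the paper's algorithms.

```python
from itertools import combinations
from typing import List, Optional


def hits(seed: str, sim: List[int]) -> bool:
    """Return True if `seed` hits the similarity `sim` at some offset.

    Assumed seed alphabet: '#' = position that must be a match,
    '-' = joker (match or mismatch). `sim` holds 1 for a match, 0 for a mismatch.
    """
    care = [j for j, c in enumerate(seed) if c == "#"]
    # range() is empty when the seed is longer than the similarity: no offsets, no hit.
    for i in range(len(sim) - len(seed) + 1):
        if all(sim[i + j] == 1 for j in care):
            return True
    return False


def is_lossless(family: List[str], m: int, k: int) -> bool:
    """Brute-force check that every length-m similarity with exactly k
    mismatches is hit by at least one seed of `family`.

    Checking exactly k mismatches is enough (for m >= k): turning a mismatch
    back into a match can only add hits, never remove one.
    """
    for mismatches in combinations(range(m), k):
        sim = [1] * m
        for p in mismatches:
            sim[p] = 0
        if not any(hits(seed, sim) for seed in family):
            return False
    return True


def lossless_threshold(family: List[str], k: int, m_max: int = 64) -> Optional[int]:
    """Hypothetical family parameter: the smallest window length m <= m_max
    for which `family` is lossless for k mismatches, or None if there is none."""
    start = max(k, min(len(seed) for seed in family))
    for m in range(start, m_max + 1):
        if is_lossless(family, m, k):
            return m
    return None
```

This small case illustrates the point of using several seeds. Neither `###` nor `##-#` alone is lossless for (m, k) = (5, 1), but the two together are. As a result, `lossless_threshold(["###", "##-#"], 1)` returns 5, while `lossless_threshold(["###"], 1)` returns 6. The enumeration grows as C(m, k) and is only practical for small m and k.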
Similar resources
Spaced Seeds Design Using Perfect Rulers
We consider the problem of lossless spaced seed design for approximate pattern matching. We show that, using mathematical objects known as perfect rulers, we can derive a family of spaced seeds for matching with up to two errors. We analyze these seeds with respect to the trade-off they offer between seed weight and the minimum length of the pattern to be matched. We prove that for patterns of ...
Lossless Seeds for Searching Short Patterns with High Error Rates
We address the problem of approximate pattern matching using the Levenshtein distance. Given a text T and a pattern P, find all locations in T that differ by at most k errors from P. For that purpose, we propose a filtration algorithm that is based on a novel type of seeds, combining exact parts and parts with a fixed number of errors. Experimental tests show that the method is specifically w...
Fast Noise Suppression for Lossless Image Coding (Extended Preprint)
Tilo Strutz, University of Rostock, Institute of Communications and Information Electronics, Richard-Wagner-Str. 31, 18119 Rostock, FRG. Abstract: This contribution presents a new denoising method for applications requiring preservation of the highest visual image quality. The aim of the new approach is not to suppress all noise ins...
Recherche de similarités dans les séquences d'ADN : modèles et algorithmes pour la conception de graines efficaces (Similarity search in DNA sequences: models and algorithms for the design of efficient seeds)
Most commonly used similarity search methods in genomic sequences are heuristic ones. These are based upon text filtering that allows one to infer potential regions of similarity. This thesis proposes new filter definitions to search for similarities in genomic sequences, and fast algorithms to measure the efficiency of these filters. More precisely, we study the spaced seed model and propos...
The Integrated Supply Chain of After-sales Services Model: A Multi-objective Scatter Search Optimization Approach
Abstract: In recent decades, the high profits of extended warranties have led third-party firms to regard them as a lucrative after-sales service. However, the segmentation of customers by risk aversion and the effect of offering extended warranties on manufacturers' basic warranties should be investigated when adjusting such services. Since risk-averse customers welcome extended warranty, while the cu...